ENV 859 - Fall 2022
© John Fay, Duke University
In this notebook, we examine how GeoPandas is used in performing a spatial analysis. Our example looks at the demographic characteristics of where electric vehicle (EV) charging stations have been located in Durham, Wake, and Orange counties with respect to 2010 Census and Social Vulnerability Index values. (It's not a very sensible analysis as done here, but we are concentrating on the mechanics more than the utility of the analysis...)
Here we'll gather and explore the data we'll be using in our analysis. This includes two datasets. First is the list of EV Charging locations, stored as a CSV file in our data folder. This dataset has coordinate columns that we'll use to construct points and convert into a geodataframe as we learned in our previous lessons.
The second dataset comprises 2010 Census block group data for all of North Carolina. We'll fetch these data from an online resource using a web service. We'll revisit how web services work later; for now, we'll use this process to fetch data for three counties: Durham, Wake, and Orange.
For each dataset, we'll get the data into geodataframe format and then explore the data in various ways. Then we'll move to Part 2, where we analyze the data.
#Import packages
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as ctx
#Command to run if autocomplete is slooooowwww
%config Completer.use_jedi = False
→ Can you explain what role each imported package might play in our analysis?
As done in a previous notebook, we want to:
#Read in charging stations CSV, convert to geodataframe
df = pd.read_csv('../data/NC_Charging_Stations.csv')
#Create the geoseries from the X and Y columns
geom = gpd.points_from_xy(x=df['Longitude'],y=df['Latitude'])
#Convert to a geodataframe, setting the CRS to WGS 84 (wkid=4326)
gdf_stations_all = gpd.GeoDataFrame(df,geometry=geom,crs=4326)
Have a quick look at the contents imported. Things to check include:
#Show the number of rows and columns
gdf_stations_all.shape
(1227, 11)
#Examine the structure of the dataframe
gdf_stations_all.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 1227 entries, 0 to 1226 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 1227 non-null int64 1 Fuel Type Code 1227 non-null object 2 Station Name 1227 non-null object 3 City 1227 non-null object 4 State 1227 non-null object 5 ZIP 1227 non-null object 6 Status Code 1227 non-null object 7 Latitude 1227 non-null float64 8 Longitude 1227 non-null float64 9 Facility Type 450 non-null object 10 geometry 1227 non-null geometry dtypes: float64(2), geometry(1), int64(1), object(7) memory usage: 105.6+ KB
#Examine a single record of data
gdf_stations_all.iloc[0]
ID 39016 Fuel Type Code ELEC Station Name City of Raleigh - Municipal Building City Raleigh State NC ZIP 27601 Status Code E Latitude 35.778416 Longitude -78.64347 Facility Type STREET_PARKING geometry POINT (-78.64347 35.778416) Name: 0, dtype: object
#Reveal the geometry type(s) contained in the geodataframe
gdf_stations_all['geometry'].type
#gdf_stations_all['geometry'].type.unique()
array(['Point'], dtype=object)
#Plot the data
gdf_stations_all.plot().grid()
→ Could you perform the same steps of importing and exploring the USGS gage locations in NC stored in the CSV file ../data/gages.csv?
We will explore web services a bit later, but we'll use the code below to acquire polygon data of census block groups for Durham, Wake, and Orange counties from an NC OneMap web service. Once imported, we'll merge these geodataframes together and use them in our subsequent analyses.
(We'll examine how this works a bit later...)
#Create a function to read NCOneMap feature services into a geodataframe
def getBlockGroupData(FIPS):
#Construct the url from the function arguments
url=f'https://services.nconemap.gov/secure/rest/services/NC1Map_Census/FeatureServer/8/query?' + \
f"where=GEOID10+LIKE+'{FIPS}%'&outFields=GEOID10,TOTAL_POP&f=geojson"
#Create a geodataframe from the URL
gdf = gpd.read_file(url)
#Return the geodataframe
return gdf
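The function works by interpolating the county FIPS code into a SQL-style "where" clause on GEOID10; every block group's GEOID begins with its county's 5-digit FIPS, so the LIKE pattern selects all of that county's block groups. A quick sketch rebuilding the query string the function assembles for Durham County:

```python
#Rebuild the query URL that getBlockGroupData assembles for Durham (FIPS 37063)
FIPS = 37063
url = (
    'https://services.nconemap.gov/secure/rest/services/NC1Map_Census/FeatureServer/8/query?'
    f"where=GEOID10+LIKE+'{FIPS}%'&outFields=GEOID10,TOTAL_POP&f=geojson"
)
print(url)
```

The f-string expands the placeholder so the where clause reads GEOID10 LIKE '37063%'; the f=geojson parameter asks the server for GeoJSON, which gpd.read_file can ingest directly.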
#Fetch census block groups for Durham, Orange, and Wake counties using the above function
gdf_DurmBlkGroups = getBlockGroupData(37063)
gdf_WakeBlkGroups = getBlockGroupData(37183)
gdf_OrangeBlkGroups = getBlockGroupData(37135)
#Challenge: See if you can fetch Chatham county block groups (FIPS = 37037)
gdf_Chatham = getBlockGroupData(37037)
#Show the Durham block group geodataframe's coordinate reference system
gdf_DurmBlkGroups.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
#Explore the Durham block group geodataframe's columns...
gdf_DurmBlkGroups.columns
Index(['geoid10', 'total_pop', 'geometry'], dtype='object')
#Examine a sample record from the geodataframe
gdf_DurmBlkGroups.iloc[10]
geoid10 370630007002 total_pop 1030 geometry POLYGON ((-78.9175097042526 35.97670311102937,... Name: 10, dtype: object
gdf_DurmBlkGroups.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 153 entries, 0 to 152 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 geoid10 153 non-null object 1 total_pop 153 non-null int64 2 geometry 153 non-null geometry dtypes: geometry(1), int64(1), object(1) memory usage: 3.7+ KB
#Plot Durham's population, setting the colormap to "magma"
gdf_DurmBlkGroups.plot(column='total_pop',cmap='magma')
<AxesSubplot: >
#Plot the block groups for all three counties
#First, plot one geodataframe, saving the output as "thePlot"
thePlot = gdf_DurmBlkGroups.plot(color='Blue')
#Then plot the other geodataframes, telling each to use the axes in "thePlot" created above
gdf_WakeBlkGroups.plot(ax=thePlot,color='Red')
gdf_OrangeBlkGroups.plot(ax=thePlot,color="Lightblue")
<AxesSubplot: >
Now that we've obtained a few datasets and got them into geodataframes, we'll perform some analysis. These include:
Doc: https://geopandas.org/indexing.html
Subsetting features in a geodataframe uses the same methods as subsetting records in a Pandas dataframe. Here we'll run through an example by subsetting EV stations found only within Durham, Raleigh, and Chapel Hill.
#Reveal the unique values in the City column
gdf_stations_all['City'].unique()
array(['Raleigh', 'Concord', 'Fayetteville', 'High Point', 'Lumberton',
'Durham', 'Charlotte', 'Sanford', 'Burlington', 'Asheville',
'Forest City', 'Gastonia', 'Goldsboro', 'Greensboro', 'Greenville',
'Hendersonville', 'Hickory', 'Jacksonville', 'Montreat',
'New Bern', 'Reidsville', 'Roanoke Rapids', 'Rockingham',
'Mt. Airy', 'Salisbury', 'Southern Pines', 'Statesville',
'Wake Forest', 'Wilkesboro', 'Wilmington', 'Wilson',
'Winston-Salem', 'Clyde', 'Cornelius', 'Elizabeth City',
'Asheboro', 'Black Mountain', 'Boone', 'Cary', 'Chapel Hill',
'Hillsborough', 'Cherokee', 'Flat Rock', 'Clinton', 'Knightdale',
'Mt. Holly', 'Mooresville', 'Weaverville', 'Fletcher', 'Pittsboro',
'Belmont', 'Lowell', 'Dallas', 'Lincolnton', 'Arden',
'Kernsersville', 'Mebane', 'Point Harbor', 'Nags Head',
'Huntersville', 'Ocean Isle Beach', 'Holden Beach', 'Lenoir',
'Waynesville', 'Maggie Valley', 'Granite Falls', 'Hot Springs',
'West Jefferson', 'Cullowhee', 'Graham', 'Indian Trail',
'Kings Mountain', 'Old Fort', 'Henderson', 'Garner', 'Roxboro',
'Castle Hayne', 'Star', 'Carrboro', 'Brevard', 'Archdale', 'Sylva',
'Dillsboro', 'Kinston', 'Oriental', 'Monroe', 'Pembroke',
'Lake Lure', 'Calabash', 'Shallotte', 'Research Triangle Park',
'Cherokee,', 'Gibsonville', 'Elkin', 'Dobson', 'Davidson',
'Franklin', 'Rocky Mount', 'Wallace', 'Hatteras', 'Mills River',
'Banner Elk', 'Beaufort', 'Benson', 'Blowing Rock', 'Buxton',
'Cape Carteret', 'Cashiers', 'Duck', 'Elon', 'Highlands',
'Kannapolis', 'Kill Devil Hills', 'Manteo', 'Murphy', 'Newton',
'Pinehurst', 'Robbinsville', 'Saluda', 'Selma', 'Southport',
'Tryon', 'Whitsett', 'Lexington', 'Shelby', 'Burgaw', 'Matthews',
'Plymouth', 'Madison', 'Avon', 'Fuquay Varina', 'Moncure',
'Mt Airy', 'King', 'Smithfield', 'Dunn', 'Lilesville', 'Raeford',
'Snow Hill', 'Evergreen', 'Edenton', 'Dudley', 'Sneads Ferry',
'Aulander', 'Mcleansville', 'Lillington', 'Sunset Beach',
'Bessemer City', 'Leland', 'Conover', 'Emerald Isle', 'Swansboro',
'Aberdeen', 'Marion', 'Enfield', 'Nashville', 'Warrenton',
'Powells Point', 'Hamptonville', 'Winterville', 'Linville', 'Apex',
'Cherry Point', 'Morrisville', 'Butner', 'SOUTHPORT', 'Haw River',
'Colfax', 'Halifax', 'Mocksville', 'Albemarle', 'Sparta',
'Troutman', 'Hampstead', 'Siler City', 'Tarboro', 'Sugar Grove',
'Cramerton', 'Hope Mills', 'Kitty Hawk', 'Robbins', 'Harrisburg',
'Spindale', 'Yadkinville', 'Pilot Mountain', 'Ellerbe',
'Burnsville', 'Scotland Neck', 'Rose Hill', 'Elizabethtown',
'Warsaw', 'Waxhaw', 'Pisgah Forest', 'Sugar Mountain', 'Littleton',
'Mars Hill', 'Marshville', 'Holly Springs', 'Louisburg',
'Oak Island', 'Washington', 'Bakersville', 'Windsor', 'Candler',
'Rich Square', 'Thomasville', 'Jackson', 'Troy', 'Morehead City',
'Kernersville', 'Oxford', 'Corolla', 'Rolesville', 'Whiteville',
'Jamestown', 'Mint Hill'], dtype=object)
#Subset records where the City is "Durham", "Raleigh", or "Chapel Hill"
gdf_stations = gdf_stations_all.query('City in ("Raleigh", "Durham", "Chapel Hill")')
gdf_stations.shape
(205, 11)
#Recall, an alternative is to use masks...
gdf_stations = gdf_stations_all.loc[
(gdf_stations_all['City'] == 'Raleigh')|
(gdf_stations_all['City'] == 'Durham')|
(gdf_stations_all['City'] == 'Chapel Hill')]
gdf_stations.shape
(205, 11)
#Another, more efficient mask using `isin`
gdf_stations = gdf_stations_all.loc[gdf_stations_all['City'].isin(("Raleigh", "Durham", "Chapel Hill"))]
gdf_stations.shape
(205, 11)
#Plot the results
myPlot = gdf_DurmBlkGroups.plot()
gdf_stations.plot("City",ax=myPlot);
#Plot them with a base map (using Contextily - more later...)
city_plot = gdf_stations.to_crs(3857).plot(column="City",figsize=(10,10),alpha=0.6)
ctx.add_basemap(city_plot)
Doc: https://geopandas.org/mergingdata.html
We'll start by using the append() function to append the Wake Co. dataset to the Durham Co. one. Then you will append the Orange Co. dataframe to that product.
→ If the CRSs were different, we could use to_crs() to transform one dataset to use the CRS of the other
#Check the crs of the two geodataframes
gdf_DurmBlkGroups.crs == gdf_WakeBlkGroups.crs
True
#Add a field to each input, setting values to identify the source dataset
gdf_DurmBlkGroups['County'] = "Durham"
gdf_WakeBlkGroups['County'] = "Wake"
#Append the Wake Co features to the Durham Co features
gdf_BlkGrp_step1 = gdf_DurmBlkGroups.append(gdf_WakeBlkGroups)
C:\Users\jo158\AppData\Local\Temp\ipykernel_9068\4223542252.py:2: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. gdf_BlkGrp_step1 = gdf_DurmBlkGroups.append(gdf_WakeBlkGroups)
#Check to see that the total rows in the merged gdf match the sum of the two component gdfs
gdf_DurmBlkGroups.shape[0] + gdf_WakeBlkGroups.shape[0] == gdf_BlkGrp_step1.shape[0]
True
#Plot the result
gdf_BlkGrp_step1.plot()
<AxesSubplot: >
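The FutureWarning above points at the replacement: pandas.concat accepts a list of dataframes, and because GeoDataFrame subclasses DataFrame, passing geodataframes should return a geodataframe. A minimal sketch with plain DataFrames standing in for the county layers (the toy rows and values here are illustrative, not from the real data):

```python
import pandas as pd

#Stand-ins for the Durham and Wake geodataframes (attribute columns only)
df_durham = pd.DataFrame({'geoid10': ['370630001001'], 'County': ['Durham']})
df_wake = pd.DataFrame({'geoid10': ['371830001001'], 'County': ['Wake']})

#pd.concat replaces the deprecated .append(); ignore_index rebuilds the row index
df_both = pd.concat([df_durham, df_wake], ignore_index=True)
print(df_both.shape)  # → (2, 2)
```

The same call, gdf_BlkGrp_step1 = pd.concat([gdf_DurmBlkGroups, gdf_WakeBlkGroups]), would reproduce the append() result without the warning.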
Append the Orange Co block group features to the gdf_BlkGrp_step1 geodataframe we just created.
Remember to:
→ Check that the coordinate reference systems match
→ Add a County field to gdf_OrangeBlkGroups, setting its value to the county name
→ Save the result as gdf_BlkGrp
#Check that the coordinate reference systems are the same
gdf_BlkGrp_step1.crs == gdf_OrangeBlkGroups.crs
True
#Add the county field
gdf_OrangeBlkGroups['County'] = "Orange"
#Append the geodataframes
gdf_BlkGrp = gdf_BlkGrp_step1.append(gdf_OrangeBlkGroups)
C:\Users\jo158\AppData\Local\Temp\ipykernel_9068\3951118901.py:2: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. gdf_BlkGrp = gdf_BlkGrp_step1.append(gdf_OrangeBlkGroups)
#Plot the output
gdf_BlkGrp.plot(column='County')
<AxesSubplot: >
We have Social Vulnerability Data to examine in our analysis, but these data are at the Tract, not BlockGroup level. Thus, to join these attributes to our geodataframe, we'll need to aggregate our blockgroups to the tract level. Fortunately, the GEOID10 attribute is structured such that the census tract is just the first 11 characters. So we will create a new column holding these first 11 characters, and then we'll dissolve our blockgroup features sharing the same tract ID to single features.
Doc: https://geopandas.org/aggregation_with_dissolve.html
gdf_BlkGrp.iloc[0]
geoid10 370630003023 total_pop 1533 geometry POLYGON ((-78.89598269934186 36.00892711836401... County Durham Name: 0, dtype: object
#Create the Tract column
gdf_BlkGrp['TRACT'] = gdf_BlkGrp['geoid10'].str[0:11]
gdf_BlkGrp.head()
| geoid10 | total_pop | geometry | County | TRACT | |
|---|---|---|---|---|---|
| 0 | 370630003023 | 1533 | POLYGON ((-78.89598 36.00893, -78.89597 36.010... | Durham | 37063000302 |
| 1 | 370630003021 | 779 | POLYGON ((-78.89606 36.01699, -78.89708 36.017... | Durham | 37063000302 |
| 2 | 370630003022 | 1114 | POLYGON ((-78.90180 36.01289, -78.90217 36.012... | Durham | 37063000302 |
| 3 | 370630005003 | 1289 | POLYGON ((-78.92329 35.99672, -78.92398 35.995... | Durham | 37063000500 |
| 4 | 370630004011 | 898 | POLYGON ((-78.93610 36.02206, -78.93612 36.021... | Durham | 37063000401 |
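The 12-character GEOID10 is positional: a 2-digit state code, a 3-digit county code, a 6-digit tract code, and a 1-digit block group code. That is why the first 11 characters identify the tract, and the same slicing recovers the county FIPS. A sketch on a sample ID from the table above:

```python
#Decompose a sample GEOID10: state(2) + county(3) + tract(6) + block group(1)
geoid = '370630003023'
state_fips = geoid[:2]    #'37' = North Carolina
county_fips = geoid[:5]   #'37063' = Durham County
tract_id = geoid[:11]     #'37063000302' = the census tract
print(state_fips, county_fips, tract_id)
```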
#Dissolve features on tract, computing summed population
gdf_Tract = gdf_BlkGrp.dissolve(by='TRACT',aggfunc={'total_pop':'sum',
'County':'first'})
#Show the results
gdf_Tract.head()
| geometry | total_pop | County | |
|---|---|---|---|
| TRACT | |||
| 37063000101 | POLYGON ((-78.87411 36.02349, -78.87417 36.023... | 3152 | Durham |
| 37063000102 | POLYGON ((-78.89264 36.01976, -78.89268 36.018... | 4535 | Durham |
| 37063000200 | POLYGON ((-78.88339 36.00374, -78.88394 36.003... | 2946 | Durham |
| 37063000301 | POLYGON ((-78.91097 36.01230, -78.91102 36.011... | 2504 | Durham |
| 37063000302 | POLYGON ((-78.89598 36.00893, -78.89632 36.008... | 3426 | Durham |
#Plot the data
gdf_Tract.plot(column='total_pop');
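Conceptually, dissolve() is a groupby on the attribute table combined with a geometric union of each group's shapes. The attribute half can be sketched with plain pandas; the same aggfunc dictionary we passed to dissolve() works with groupby().agg(). (Toy data below; only the first population value is from the real table.)

```python
import pandas as pd

#Toy block-group attributes: two block groups in one tract, one in another
df = pd.DataFrame({
    'TRACT': ['37063000101', '37063000101', '37063000200'],
    'total_pop': [1533, 500, 2946],
    'County': ['Durham', 'Durham', 'Durham'],
})

#Aggregate attributes per tract, just as dissolve() does internally
summary = df.groupby('TRACT').agg({'total_pop': 'sum', 'County': 'first'})
print(summary.loc['37063000101', 'total_pop'])  # → 2033
```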
Now that we have the data at the tract level, we can join the Social Vulnerability Index data, stored in a CSV file (../data/NC_SVI_2018.csv).
Doc: https://geopandas.org/mergingdata.html#attribute-joins
#Import and explore the SVI data
df_SVI = pd.read_csv('../data/NC_SVI_2018.csv', dtype={'FIPS':'str',
'ST':'str',
'STCNTY':'str'})
Challenge:
→ Modify the read_csv() command above so that 'ST' and 'STCNTY' are also imported as strings.
#Plot a histogram of the SVI values
df_SVI['SVI'].hist();
#Create a mask of values greater than or equal to zero
valid_mask = df_SVI['SVI'] >= 0
#Apply that mask
df_SVI_fixed = df_SVI.loc[valid_mask]
#View the histogram again
df_SVI_fixed['SVI'].hist()
<AxesSubplot: >
Phew! Exploring the data paid off!
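The "values >= 0" mask works because missing tracts in SVI products are typically coded with a negative sentinel value, commonly -999 (an assumption about this particular file, not something we've verified here). An alternative approach converts the sentinel to NaN so those rows drop out of plots and statistics automatically:

```python
import numpy as np
import pandas as pd

#Toy SVI series with a hypothetical -999 sentinel marking a missing tract
svi = pd.Series([0.81, 0.34, -999.0, 0.57])

#Replace the sentinel with NaN instead of dropping the rows outright
svi_clean = svi.replace(-999.0, np.nan)
print(int(svi_clean.isna().sum()))  # → 1
```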
#Have a look at the merge command syntax
gdf_Tract.merge?
#Join the SVI data to the tract features
gdf_Tract_joined = gdf_Tract.merge(df_SVI_fixed,
left_on='TRACT',
right_on='FIPS',
how='left')
#Examine the output
gdf_Tract_joined.head()
| geometry | total_pop | County | ST | STATE | STCNTY | COUNTY | FIPS | SVI | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((-78.87411 36.02349, -78.87417 36.023... | 3152 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000101 | 0.8080 |
| 1 | POLYGON ((-78.89264 36.01976, -78.89268 36.018... | 4535 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000102 | 0.8547 |
| 2 | POLYGON ((-78.88339 36.00374, -78.88394 36.003... | 2946 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000200 | 0.5706 |
| 3 | POLYGON ((-78.91097 36.01230, -78.91102 36.011... | 2504 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000301 | 0.7048 |
| 4 | POLYGON ((-78.89598 36.00893, -78.89632 36.008... | 3426 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000302 | 0.3392 |
#Explore the output, looking for null values
gdf_Tract_joined.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 275 entries, 0 to 274 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 geometry 275 non-null geometry 1 total_pop 275 non-null int64 2 County 275 non-null object 3 ST 270 non-null object 4 STATE 270 non-null object 5 STCNTY 270 non-null object 6 COUNTY 270 non-null object 7 FIPS 270 non-null object 8 SVI 270 non-null float64 dtypes: float64(1), geometry(1), int64(1), object(6) memory usage: 21.5+ KB
#Plot the output
gdf_Tract_joined.plot(column='SVI')
<AxesSubplot: >
Looks like we have some features missing SVI data. Let's examine those more closely.
#Create a mask of null SVI values
null_mask = gdf_Tract_joined['SVI'].isnull()
null_mask.sum() # gives the number of null values
5
#Apply the mask
gdf_Tract_joined.loc[null_mask]
| geometry | total_pop | County | ST | STATE | STCNTY | COUNTY | FIPS | SVI | |
|---|---|---|---|---|---|---|---|---|---|
| 20 | POLYGON ((-78.91595 36.00978, -78.91718 36.009... | 1941 | Durham | NaN | NaN | NaN | NaN | NaN | NaN |
| 59 | POLYGON ((-78.88307 35.88604, -78.88532 35.886... | 4 | Durham | NaN | NaN | NaN | NaN | NaN | NaN |
| 79 | POLYGON ((-79.04624 35.91339, -79.04606 35.913... | 2350 | Orange | NaN | NaN | NaN | NaN | NaN | NaN |
| 273 | POLYGON ((-78.80909 35.88187, -78.80944 35.881... | 2 | Wake | NaN | NaN | NaN | NaN | NaN | NaN |
| 274 | POLYGON ((-78.74827 35.87112, -78.74843 35.871... | 30 | Wake | NaN | NaN | NaN | NaN | NaN | NaN |
We can either assign a value to these missing values or leave them as no data. We'll just leave them blank for now...
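If we did choose to fill the gaps, fillna() handles it. Here is a sketch filling each missing SVI with the median SVI of its county; the column names match ours, but the rows and values are toy data, and filling with a county median is just one illustrative choice:

```python
import numpy as np
import pandas as pd

#Toy joined table: one tract is missing its SVI value
df = pd.DataFrame({
    'County': ['Durham', 'Durham', 'Wake'],
    'SVI': [0.8, np.nan, 0.3],
})

#Fill each gap with the median SVI of that row's county
df['SVI_filled'] = df['SVI'].fillna(df.groupby('County')['SVI'].transform('median'))
print(df['SVI_filled'].tolist())  # → [0.8, 0.8, 0.3]
```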
Our combined dataframes have a field indicating the total population in each block group. We want to compute population density from this and from the area of each tract. We don't yet have an area field in our dataframe, but we can compute that from our spatial features. But before we can do this, we need to transform our data into a projected coordinate system. So... the steps for this analysis include:
→ Project the data to a projected coordinate system (UTM Zone 17N)
→ Compute an Area_km2 column in our dataframe
→ Compute a PopDens column in our dataframe by dividing total_pop by Area_km2
#Project the data to UTM Zone 17N (EPSG 32617)
gdf_Tract_utm = gdf_Tract_joined.to_crs(32617)
gdf_Tract_utm.head()
| geometry | total_pop | County | ST | STATE | STCNTY | COUNTY | FIPS | SVI | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((691558.001 3988644.329, 691553.732 3... | 3152 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000101 | 0.8080 |
| 1 | POLYGON ((689897.492 3988193.797, 689896.525 3... | 4535 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000102 | 0.8547 |
| 2 | POLYGON ((690769.570 3986435.083, 690720.109 3... | 2946 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000200 | 0.5706 |
| 3 | POLYGON ((688263.171 3987331.051, 688261.050 3... | 2504 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000301 | 0.7048 |
| 4 | POLYGON ((689621.976 3986986.040, 689591.484 3... | 3426 | Durham | 37 | NORTH CAROLINA | 37063 | Durham | 37063000302 | 0.3392 |
Compute an Area_km2 column in our dataframe
#Compute a new column of geometry area (in sq km)
gdf_Tract_utm['Area_km2'] = gdf_Tract_utm['geometry'].area/1000000
Compute a PopDens column in our dataframe by dividing total_pop by Area_km2
#Compute a new column of population density
gdf_Tract_utm['PopDens'] = gdf_Tract_utm['total_pop']/gdf_Tract_utm['Area_km2']
#Plot the distribution of population densities
#gdf_Tract_utm['Area_km2'].hist(bins=20)
gdf_Tract_utm['PopDens'].hist(bins=20)
<AxesSubplot: >
#Plot a map of population density
gdf_Tract_utm.plot(column='PopDens');
#Log transform the PopDens data
import numpy as np
gdf_Tract_utm['PD_log'] = np.log(gdf_Tract_utm['PopDens'])
#Plot the log-transformed distribution of population density
gdf_Tract_utm['PD_log'].hist(bins=20)
<AxesSubplot: >
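One caveat with np.log: a tract with zero population would yield -inf and distort the histogram. A defensive sketch masks non-positive densities first (representing them as NaN is one choice, not the only one):

```python
import numpy as np

#Toy density values, including a zero-population tract
dens = np.array([1315.9, 42.0, 0.0])

#Replace non-positive densities with NaN before taking the log
safe = np.where(dens > 0, dens, np.nan)
log_dens = np.log(safe)
print(bool(np.isnan(log_dens[2])))  # → True
```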
#Plot a map of log-transformed population density
gdf_Tract_utm.plot(column='PD_log');
Doc: https://geopandas.org/set_operations.html
Previously, we subset EV stations by an attribute (City). Here we'll see how we can instead select features spatially. We do this with GeoPandas' overlay operations.
To spatially select features:
→ Ensure both datasets share the same CRS
→ Use overlay's intersection operation to select the EV features that overlap the Tracts dataset
#Ensure both datasets share the same crs
# gdf_Tract_utm.crs == gdf_stations_all.crs returns False here
gdf_Tract_utm.crs.to_string(),gdf_stations_all.crs.to_string()
('EPSG:32617', 'EPSG:4326')
#Project one dataset to match the other
gdf_stations_all_utm = gdf_stations_all.to_crs(gdf_Tract_utm.crs)
#plot points on tracts
myplot = gdf_Tract_utm.plot(figsize=(12,6),alpha=0.6,color='gray')
gdf_stations_all_utm.plot(ax=myplot);
#Intersect the two dataframes
gdf_stations_select = gpd.overlay(df1 = gdf_stations_all_utm,
df2 = gdf_Tract_utm,
how = 'intersection')
#View the data
gdf_stations_select.head()
| ID | Fuel Type Code | Station Name | City | State | ZIP | Status Code | Latitude | Longitude | Facility Type | ... | ST | STATE | STCNTY | COUNTY | FIPS | SVI | Area_km2 | PopDens | PD_log | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39016 | ELEC | City of Raleigh - Municipal Building | Raleigh | NC | 27601 | E | 35.778416 | -78.643470 | STREET_PARKING | ... | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 2.145192 | 1315.96601 | 7.182326 | POINT (713000.165 3961933.778) |
| 1 | 39017 | ELEC | City of Raleigh - Downtown | Raleigh | NC | 27601 | E | 35.774350 | -78.642287 | STREET_PARKING | ... | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 2.145192 | 1315.96601 | 7.182326 | POINT (713117.970 3961485.268) |
| 2 | 40751 | ELEC | City of Raleigh - Wilmington Station Deck | Raleigh | NC | 27601 | E | 35.779370 | -78.638203 | PAY_GARAGE | ... | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 2.145192 | 1315.96601 | 7.182326 | POINT (713473.768 3962051.084) |
| 3 | 42396 | ELEC | Raleigh Municipal Building Deck | Raleigh | NC | 27601 | E | 35.779082 | -78.641779 | MUNI_GOV | ... | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 2.145192 | 1315.96601 | 7.182326 | POINT (713151.258 3962011.344) |
| 4 | 42397 | ELEC | City of Raleigh | Raleigh | NC | 27601 | T | 35.777283 | -78.639281 | STREET_PARKING | ... | 37 | NORTH CAROLINA | 37183 | Wake | 37183050100 | 0.3077 | 2.145192 | 1315.96601 | 7.182326 | POINT (713381.895 3961817.202) |
5 rows × 22 columns
#Plot the data
#Construct the first (lowest layer) of the data to plot, saving it as "the_plot"
the_plot = gdf_Tract_utm.plot(
color='lightgrey', #Set the fill of the Tract features
edgecolor='grey', #Set the edge color...
alpha=0.4, #Set the transparency...
figsize=(12,12)) #Set the size of the figure
#Plot the stations, setting them to share the axes of "the_plot"
gdf_stations_select.plot(
ax=the_plot, #Draw it on the plot created above
column='SVI', #Color the features by values in the SVI column
markersize=45, #Set the size of the markers
edgecolor='white'); #Set the edge color of the markers
#Use Contextily to add a nice backdrop
ctx.add_basemap(
ax = the_plot, #Add it to our existing plot
crs=gdf_Tract_utm.crs, #Transform the background to match the data's crs
source=ctx.providers.CartoDB.Voyager) #Set the type of backdrop
Doc: https://geopandas.org/mergingdata.html#spatial-joins
Now that we have a proper subset of EV stations, let's add demographic data to our EV locations by performing a spatial join with the tract geodataframe.
→ What would happen if we made a typo and only included a single equals sign?
#Compare crs
gdf_stations_select.crs == gdf_Tract_utm.crs
True
#Execute the spatial join
gdf_JoinedData = gpd.sjoin(left_df = gdf_stations_select,
right_df = gdf_Tract_utm,
how = 'left',
lsuffix = 'ev',
rsuffix = 'tracts')
#View the data
gdf_JoinedData.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 313 entries, 0 to 312 Data columns (total 34 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 313 non-null int64 1 Fuel Type Code 313 non-null object 2 Station Name 313 non-null object 3 City 313 non-null object 4 State 313 non-null object 5 ZIP 313 non-null object 6 Status Code 313 non-null object 7 Latitude 313 non-null float64 8 Longitude 313 non-null float64 9 Facility Type 106 non-null object 10 total_pop_ev 313 non-null int64 11 County_ev 313 non-null object 12 ST_ev 305 non-null object 13 STATE_ev 305 non-null object 14 STCNTY_ev 305 non-null object 15 COUNTY_ev 305 non-null object 16 FIPS_ev 305 non-null object 17 SVI_ev 305 non-null float64 18 Area_km2_ev 313 non-null float64 19 PopDens_ev 313 non-null float64 20 PD_log_ev 313 non-null float64 21 geometry 313 non-null geometry 22 index_tracts 313 non-null int64 23 total_pop_tracts 313 non-null int64 24 County_tracts 313 non-null object 25 ST_tracts 305 non-null object 26 STATE_tracts 305 non-null object 27 STCNTY_tracts 305 non-null object 28 COUNTY_tracts 305 non-null object 29 FIPS_tracts 305 non-null object 30 SVI_tracts 305 non-null float64 31 Area_km2_tracts 313 non-null float64 32 PopDens_tracts 313 non-null float64 33 PD_log_tracts 313 non-null float64 dtypes: float64(10), geometry(1), int64(4), object(19) memory usage: 85.6+ KB
#Plot
the_plot = gdf_Tract_utm.plot(
color='lightgrey',
edgecolor='grey',
alpha=0.4,
figsize=(12,12))
gdf_JoinedData.plot(
ax=the_plot,
column='SVI_ev',
markersize=60,
edgecolor='white');
ctx.add_basemap(
ax=the_plot,
crs=gdf_Tract_utm.crs,
source=ctx.providers.CartoDB.Voyager)
#Make some graphical plots
ax=pd.DataFrame(gdf_JoinedData).boxplot(
column='SVI_ev',
by='City',
rot=45,
figsize=(19,5));
With this document we now have a fully reproducible analytic workflow, complete with visualizations of our results. We can export this notebook as an HTML document and share that, or we can commit this document to our GitHub account and share a link to the notebook.
3.1.1
Export your completed notebook as an HTML document.
View it in a web browser
3.1.2
Commit the changes to your forked repository
Navigate to https://nbviewer.jupyter.org/ and paste your repository's URL
Share this link with others...
We can also save the resulting geodataframes as feature classes for more analysis, either in Python or in ArcGIS Pro.
We can export our gdf_JoinedData geodataframe easily, either as a shapefile or a CSV file.
#Export the geodataframe to a shapefile
gdf_JoinedData.to_file(
filename='../data/EV_sites_with_data.shp',
driver='ESRI Shapefile',
index=False
)
C:\Users\jo158\AppData\Local\Temp\ipykernel_9068\3809339296.py:2: UserWarning: Column names longer than 10 characters will be truncated when saved to ESRI Shapefile. gdf_JoinedData.to_file(
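The warning above stems from the shapefile format's 10-character limit on field names. If preserving full column names matters, GeoPandas can write to GeoPackage instead. A small self-contained sketch (the tiny stand-in geodataframe and output path are hypothetical):

```python
import os, tempfile
import geopandas as gpd
from shapely.geometry import Point

#Tiny stand-in geodataframe with a field name longer than 10 characters
gdf_demo = gpd.GeoDataFrame(
    {'Station_Name_Full': ['City of Raleigh - Downtown']},
    geometry=[Point(-78.64, 35.77)],
    crs=4326)

#GeoPackage has no 10-character field-name limit, so names survive intact
out_path = os.path.join(tempfile.mkdtemp(), 'EV_sites_with_data.gpkg')
gdf_demo.to_file(out_path, driver='GPKG')
print(os.path.exists(out_path))  # → True
```

The same to_file() call on gdf_JoinedData, with a .gpkg filename and driver='GPKG', would avoid the truncation entirely, and ArcGIS Pro reads GeoPackages directly.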
#Export the geodataframe to a CSV file
pd.DataFrame(gdf_JoinedData).to_csv(
'../data/my_saved_data.csv',
index=False
)